perf: cache or index TestSuiteGate relevant-test discovery by shrutu0929 · Pull Request #143 · Refactron-ai/Refactron_lib

shrutu0929 · 2026-04-03T16:55:21Z

Solves #139.
This commit optimizes test file discovery in TestSuiteGate._find_relevant_tests to reduce latency in large repositories. It adds an in-memory cache keyed by the root directory (project_root if set, otherwise search_root) and the target module. Each entry stores the test files found to import that module. When test discovery is requested, the cache is consulted first and previously resolved paths are returned immediately. On a cache miss, the usual file scan runs with one refinement: paths inside non-target directories such as virtual environments, .pytest_cache, and site-packages are skipped to reduce disk I/O. The verified test paths are then stored in the cache. Later lookups for the same root and module in the same process therefore skip re-reading the test files and re-parsing their abstract syntax trees (ASTs).
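The sketch below shows one minimal way to implement the caching flow described above. It assumes a cache keyed by the resolved root and the target module. The ignore set, `iter_test_files`, and `find_relevant_tests` are illustrative names, not the code in this PR. The import check is passed in as a callable so the sketch stays focused on caching and directory skipping.

```python
import os
from pathlib import Path
from typing import Callable, Dict, Iterator, List, Tuple

# Assumed ignore set, based on the directories named in this PR description.
IGNORED_DIRS = frozenset(
    {".git", "__pycache__", ".pytest_cache", "site-packages", "venv", ".venv"}
)

# Module-level cache: (resolved root, target module) -> relevant test files.
_TEST_DISCOVERY_CACHE: Dict[Tuple[Path, str], List[Path]] = {}


def iter_test_files(root: Path) -> Iterator[Path]:
    """Yield test_*.py and *_test.py files under root, skipping ignored dirs."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in IGNORED_DIRS]
        for name in filenames:
            if name.endswith(".py") and (
                name.startswith("test_") or name.endswith("_test.py")
            ):
                yield Path(dirpath) / name


def find_relevant_tests(
    root: Path,
    target_module: str,
    imports_module: Callable[[Path, str], bool],
) -> List[Path]:
    """Serve from the cache when possible; otherwise scan, filter and cache."""
    key = (root.resolve(), target_module)
    cached = _TEST_DISCOVERY_CACHE.get(key)
    if cached is not None:
        return list(cached)  # hit: no directory walk, no file reads
    tests = sorted(p for p in iter_test_files(root) if imports_module(p, target_module))
    _TEST_DISCOVERY_CACHE[key] = tests
    return list(tests)  # copy so callers cannot mutate the cached list
```

The sketch prunes `dirnames` in place, so `os.walk` never lists the ignored directories at all. By contrast, filtering the results of `rglob("*.py")` still walks those directories. Pruning is therefore one way to make the skip actually save disk I/O.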

Summary by CodeRabbit

New Features
- Introduced caching for test discovery results to boost performance and reduce scan times
- Improved test file identification through comprehensive scanning and import analysis

coderabbitai · 2026-04-03T16:57:01Z

📝 Walkthrough


Added a module-level test discovery cache and implemented a new TestSuiteGate that recursively scans for test files, filters excluded directories, and parses each file's AST to detect module imports. The constructor signature was updated to accept a search_root parameter.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Test Discovery Infrastructure: `refactron/verification/checks/test_gate.py` | Added `_TEST_DISCOVERY_CACHE` global for caching discovered test files. Implemented test discovery with directory filtering (venv, site-packages, .git, __pycache__), AST-based import detection, and helper methods `_imports_module()` and `_find_relevant_tests()`. Updated `TestSuiteGate.__init__()` signature to accept `search_root` parameter. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

perf: cache or index TestSuiteGate relevant-test discovery #139 — Directly addresses optimized test discovery with caching and improved directory filtering behavior through modifications to _find_relevant_tests() and _imports_module() functions.

Poem

🐰 A cache of tests, now swift and keen,
Scanning paths where none were seen,
AST whispers of imports true,
No venv clutter blocks my view! 🌟

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: adding caching/indexing to TestSuiteGate's test discovery for performance improvement. |


coderabbitai

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

refactron/verification/checks/test_gate.py (1)
73-80: 🛠️ Refactor suggestion | 🟠 Major

Integration guidance: Merge caching into the existing class.

To properly implement this PR's caching optimization:

- Keep the existing TestSuiteGate(BaseCheck) class as the single definition
- Add the _TEST_DISCOVERY_CACHE at module level (after existing imports)
- Modify the existing _find_relevant_tests method (line 166) to consult/update the cache
- Add the ignore_patterns filtering to the existing method
- Keep the existing _imports_module exact-matching logic (line 186)

This preserves API compatibility and maintains the BaseCheck contract.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/verification/checks/test_gate.py` around lines 73 - 80, Keep the
single TestSuiteGate(BaseCheck) class and add a module-level cache named
_TEST_DISCOVERY_CACHE (defined after imports). In the existing
TestSuiteGate._find_relevant_tests method, consult _TEST_DISCOVERY_CACHE keyed
by relevant state (e.g., changed file path or module name) to return cached test
lists when present, and update the cache after discovering tests. Also apply
ignore_patterns filtering inside _find_relevant_tests before returning/updating
the cache. Do not change the exact-matching logic in
TestSuiteGate._imports_module; leave that method as-is.

🧹 Nitpick comments (1)

refactron/verification/checks/test_gate.py (1)
53-58: Cache stores Path objects without an invalidation strategy.

The cache stores Path objects that could become stale if test files are added, modified, or deleted between calls in the same process. Consider:

- Storing modification timestamps and checking for staleness
- Adding a TTL-based expiration
- Providing a clear_cache() function for explicit invalidation
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/verification/checks/test_gate.py` around lines 53 - 58, The cache
(_TEST_DISCOVERY_CACHE) currently stores Path objects for a
cache_key/target_module pair and can become stale; modify the logic around the
caching in the function that writes/reads _TEST_DISCOVERY_CACHE so that before
returning cached relevant_tests you validate freshness—store file mtimes (or a
hash) alongside the Path entries and compare current mtimes to detect changes,
or attach a TTL timestamp and expire entries when older than the TTL;
additionally add a public clear_cache() helper that empties
_TEST_DISCOVERY_CACHE for explicit invalidation and ensure the function that
populates the cache updates the stored metadata whenever relevant_tests are
recalculated.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@refactron/verification/checks/test_gate.py`:
- Around lines 42-51: The loop scanning files via search_root.rglob("*.py")
incorrectly applies the ignore_patterns only when project_root is set and only
recognizes test files with a "test_" prefix; update the logic so the ignore
check (any(part in ignore_patterns for part in p.parts)) runs unconditionally
for every candidate from search_root.rglob, and broaden the test-file detection
to accept either p.name.startswith("test_") or p.name.endswith("_test.py"); keep
the existing p.is_file() and _imports_module(p, target_module) checks and append
matching paths to relevant_tests as before.
- Around lines 21-27: The import-check currently uses substring matching
(target_module in alias.name and target_module in node.module) which causes
false positives; update the checks in the ast.Import and ast.ImportFrom branches
to use exact-module or submodule matching instead: replace "target_module in
alias.name" with "alias.name == target_module or
alias.name.startswith(target_module + '.')", and replace "target_module in
node.module" with "node.module == target_module or
node.module.startswith(target_module + '.')", keeping the rest of the logic
around ast.Import, ast.ImportFrom, alias.name and node.module unchanged.
- Around lines 10-14: Remove the duplicate TestSuiteGate class and merge its
caching behavior into the existing TestSuiteGate that inherits from BaseCheck;
specifically, delete the standalone class definition that defines __init__(self,
search_root: Path, project_root: Optional[Path] = None) and instead incorporate
any cache fields and logic into the existing TestSuiteGate(BaseCheck)
implementation (keep its original constructor signature and API), update its
__init__ to initialize cache using search_root/project_root values without
changing parameters expected by callers, and ensure the existing verify(self,
...) method implements the new caching checks so the class continues to satisfy
the BaseCheck contract used by verification/engine.py.
- Line 59: The module docstring string literal """TestSuiteGate — Check 3: run
relevant tests against transformed code.""" is not at the top of the file; move
that docstring to be the very first statement in the module (before any imports)
and remove the unintended prepended lines that pushed it down (the stray content
added above the imports), ensuring the file begins with the docstring and then
the imports and definitions (e.g., classes/functions in this module such as
TestSuiteGate or any test-related functions).
- Around lines 28-30: The code currently swallows all exceptions in the except
Exception: pass block (returning False), which hides parsing/encoding errors;
change the handler to except Exception as e: and log the exception (for example
using logging.exception(...) or a module-level logger.exception(...)) before
returning False so errors during AST parsing in test_gate.py are visible; ensure
the logging call includes context (e.g., "failed to evaluate test gate" and the
exception).
- Around lines 1-8: The new caching code was prepended and created duplicate
imports and a second TestSuiteGate definition; remove the duplicated top-level
imports and the extra TestSuiteGate class and integrate the caching logic into
the existing TestSuiteGate class (use the class's existing methods that discover
tests to consult/update the cache named _TEST_DISCOVERY_CACHE rather than a
separate block). Replace the unbounded module-level dict by using
functools.lru_cache or a bounded eviction structure (or wrap cache accesses with
a max-size policy) and ensure all references use the same _TEST_DISCOVERY_CACHE
symbol inside TestSuiteGate so there is a single canonical implementation.
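The exact/submodule matching and logged parse failures requested in the inline comments for lines 21-27 and 28-30 could be sketched as follows. `module_matches` and `imports_module` are illustrative names. Narrowing the `except` clause to parse- and I/O-related errors is a choice made here; the comment itself suggests `except Exception as e` with logging.

```python
import ast
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def module_matches(imported: str, target_module: str) -> bool:
    """Exact or submodule match, avoiding substring false positives."""
    return imported == target_module or imported.startswith(target_module + ".")


def imports_module(path: Path, target_module: str) -> bool:
    """Return True if the file at path imports target_module or a submodule."""
    try:
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
    except (OSError, UnicodeDecodeError, SyntaxError, ValueError):
        # Log with traceback instead of silently returning False.
        logger.exception(
            "Failed to parse %s while looking for imports of %s", path, target_module
        )
        return False
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            if any(module_matches(alias.name, target_module) for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom) and node.module:
            if module_matches(node.module, target_module):
                return True
    return False
```

With substring matching, `"pkg.mod" in "pkg.module"` and `"pkg.mod" in "mypkg.mod"` are both true; `module_matches` rejects both. Note that `from pkg import mod` is not matched when the target is `pkg.mod`, because only `node.module` (`"pkg"`) is compared. Also checking `f"{node.module}.{alias.name}"` would cover that case.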



ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9b4e40d5-cc44-4733-94a6-658cc1b005ec

📥 Commits

Reviewing files that changed from the base of the PR and between c89cfdb and 03af1c7.

📒 Files selected for processing (1)

refactron/verification/checks/test_gate.py

coderabbitai · 2026-04-03T17:00:10Z